The discovery of a new drug, which is essentially the process of identifying a new molecule or compound, presents both scientific and analytical challenges. Pharmaceutical companies currently have to deal with long cycle times from drug discovery to commercialisation and with increased risk due to regulatory pressures. Powerful technology solutions help researchers overcome these challenges and have therefore become an essential component of the drug discovery and development process.
Pharmaceutical companies are increasingly aware of the potential of genome research, and many of the large ones now realise that the route to the "drugs of tomorrow" begins with the genes discovered today. High-throughput sequencing and automated genotyping technologies have led to many important genetic discoveries, paving the way for new approaches to drug development. This genomics revolution will undoubtedly change the face of biomedical research.
In the pharma industry, bioinformatics plays an important role in all three key workspaces: discovery, preclinical research and clinical research. The main objective is to develop goal-centric solutions and tools for each of these workspaces. These tools have also made it easier to integrate the storage, querying, analysis and unified visualisation of data.
Discovery
Genes encode proteins, which carry out most of the biological functions in the human body. As a result, faulty genes or proteins are often responsible for disease. Large-scale experimental analyses are essential to decipher the processes underlying most human diseases (diagnostics) and thus to identify biological targets. Only then can this information be used to discover one or more compounds that act specifically and selectively on a particular target, so that the disease can be treated effectively (therapeutics). High-throughput genomic and post-genomic technologies such as sequencing, proteomics and microarrays are used to discover the genes and proteins involved in a particular disease rapidly and precisely. However, these high-throughput techniques have revolutionised the drug discovery process, and they also present exciting but difficult data management and integration problems.
Persistent Systems has developed several tools for the discovery phase, including:
- A data warehouse for sequencing data: The warehouse comprises separate data marts covering the entire sequencing pipeline. These include quality tracking, a laboratory information management system (LIMS) that manages the laboratory workflow, and sequence assembly. The LIMS tracks the phases of the lab workflow. Examples are adding reagents to the samples (DNA fragments) to be sequenced, transferring samples from one plate to another and processing samples on several decks (machines). The warehouse also integrates data related to the sequencing process and generates project management reports based on read lengths and quality scores. It also links sequence quality to factors in the lab workflow: the materials, plates and decks involved in sequencing, and the temperature and humidity during each phase. This lets users identify the factor(s) behind a drop or rise in sequencing quality.
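At its core, linking sequencing quality to workflow factors is a group-by over read records. The following is a minimal Python sketch under assumed conditions, not the warehouse's actual implementation. The record fields (`plate`, `deck`, `mean_qv`) and the values are hypothetical, and each read is assumed to carry a single mean quality score.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List

# Hypothetical read records: each read carries a mean quality score and the
# workflow factors (plate, deck) it passed through in the lab.
READS: List[dict] = [
    {"read_id": "r1", "plate": "P1", "deck": "D1", "mean_qv": 32.0},
    {"read_id": "r2", "plate": "P1", "deck": "D2", "mean_qv": 18.0},
    {"read_id": "r3", "plate": "P2", "deck": "D1", "mean_qv": 30.0},
    {"read_id": "r4", "plate": "P2", "deck": "D2", "mean_qv": 20.0},
]


def mean_quality_by(records: List[dict], factor: str) -> Dict[str, float]:
    """Group reads by one workflow factor and average their quality scores."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for rec in records:
        groups[rec[factor]].append(rec["mean_qv"])
    return {value: mean(scores) for value, scores in groups.items()}


def worst_value(records: List[dict], factor: str) -> str:
    """Return the factor value (e.g. a deck) with the lowest mean quality."""
    by_value = mean_quality_by(records, factor)
    return min(by_value, key=by_value.get)


if __name__ == "__main__":
    print(mean_quality_by(READS, "deck"))  # {'D1': 31.0, 'D2': 19.0}
    print(worst_value(READS, "deck"))      # D2
```

In this toy data, deck D2 stands out, while plates P1 and P2 have identical mean quality. That points at the deck rather than the plate. In a warehouse, the same aggregation would typically be written as a SQL `GROUP BY` over the relevant data marts.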
- A tool to identify mutations from sequencing data (Mutation Viewer): Mutation Viewer identifies and visualises mutations (insertions, deletions, substitutions or indels) in patient samples. It looks at a particular gene or group of genes responsible for the disease in question and compares the observed sequence with RefSeq sequences available from NCBI. Mutations that fall within important domains can be seen at a glance, and known SNPs can also be viewed.
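Once a patient sequence has been aligned to its RefSeq reference, classifying the differences takes a single walk along the alignment. Below is a minimal Python illustration, not Mutation Viewer's code. It assumes the alignment already exists (the alignment step is not shown), that '-' marks a gap, and that no column is a gap in both sequences. It reports insertions, deletions and substitutions. Merging adjacent events into combined indels would need extra logic.

```python
from typing import List, Tuple

# (kind, 1-based reference position, reference bases, sample bases)
Mutation = Tuple[str, int, str, str]


def call_mutations(ref_aln: str, sample_aln: str) -> List[Mutation]:
    """Classify differences between a reference and a sample that are already
    aligned to equal length, with '-' marking gaps.

    For an insertion, the position is the reference base the new bases follow.
    """
    if len(ref_aln) != len(sample_aln):
        raise ValueError("aligned sequences must have equal length")
    mutations: List[Mutation] = []
    ref_pos = 0  # number of reference bases consumed so far
    i, n = 0, len(ref_aln)
    while i < n:
        r, s = ref_aln[i], sample_aln[i]
        if r == "-":
            # Insertion: a run of bases present in the sample but not the reference.
            j = i
            while j < n and ref_aln[j] == "-":
                j += 1
            mutations.append(("insertion", ref_pos, "", sample_aln[i:j]))
            i = j
        elif s == "-":
            # Deletion: a run of reference bases missing from the sample.
            j = i
            while j < n and sample_aln[j] == "-":
                j += 1
            mutations.append(("deletion", ref_pos + 1, ref_aln[i:j], ""))
            ref_pos += j - i
            i = j
        else:
            # Aligned bases: a mismatch is a substitution.
            ref_pos += 1
            if r != s:
                mutations.append(("substitution", ref_pos, r, s))
            i += 1
    return mutations


if __name__ == "__main__":
    print(call_mutations("ACGT-ACGTA", "ACCTTAC--A"))
    # [('substitution', 3, 'G', 'C'), ('insertion', 4, '', 'T'), ('deletion', 7, 'GT', '')]
```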
- A Java-based laboratory information management system (LIMS) for proteomics and microarrays: The microarray LIMS provides a complete solution for managing microarray experiments in a secure environment. The schema for the LIMS is shown in the diagram.
The proteomics LIMS seamlessly integrates the workflow from initial sample characterization through gel analysis to protein identification. The workflow supports 2D gels, including DIGE, as well as mass spectrometry. The LIMS manages multiple roles and users, records the lab element each user is working on and generates reports.
- An annotation server that integrates publicly available information about genes from various data sources:
The annotation server contains annotation information downloaded from various publicly available gene annotation databases such as UniGene, Entrez Gene, HomoloGene, UniSTS, dbSNP, BioCarta, PubMed and Gene Ontology (GO). It uses this information to annotate genes corresponding to probes on the microarray.
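Annotating probes amounts to joining each probe's gene identifier against locally stored copies of the source databases. Below is a minimal Python sketch. It assumes each source has already been downloaded and reduced to a dictionary keyed by Entrez Gene ID. The field names, probe identifiers and the placeholder gene ID are hypothetical; the article does not describe the server's real schema.

```python
from typing import Dict, Optional

# Hypothetical probe-to-gene mapping for a microarray (Entrez Gene IDs).
PROBE_TO_GENE: Dict[str, str] = {
    "probe_001": "7157",          # TP53
    "probe_002": "672",           # BRCA1
    "probe_003": "no_such_gene",  # hypothetical ID absent from every source
}

# Hypothetical local extracts of public sources, each keyed by gene ID.
SOURCES: Dict[str, Dict[str, str]] = {
    "symbol": {"7157": "TP53", "672": "BRCA1"},
    "name": {"7157": "tumor protein p53", "672": "BRCA1 DNA repair associated"},
}


def annotate_probes(probe_to_gene: Dict[str, str],
                    sources: Dict[str, Dict[str, str]]
                    ) -> Dict[str, Dict[str, Optional[str]]]:
    """Attach every source's entry for a probe's gene to that probe."""
    annotated: Dict[str, Dict[str, Optional[str]]] = {}
    for probe, gene_id in probe_to_gene.items():
        record: Dict[str, Optional[str]] = {"gene_id": gene_id}
        for field, table in sources.items():
            record[field] = table.get(gene_id)  # None when the source lacks the gene
        annotated[probe] = record
    return annotated


if __name__ == "__main__":
    for probe, record in annotate_probes(PROBE_TO_GENE, SOURCES).items():
        print(probe, record)
```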
- A C++ application to analyze experimental data in the context of annotations: This application interfaces with the gene annotation server described above to facilitate analysis of global gene expression data generated using microarrays. It allows a biologist to make complex queries and understand the underlying biological processes. The tool helps identify co-regulated genes and genes involved in a particular process, and visualize a network of genes with similar expression profiles.
- Interactive visualization tools, such as a 3D scatter plot, for analyzing samples involved in microarray experiments: This is a tool for viewing genomic data. The data is specified in an XML format and rendered as a 3D scatter plot and/or a 2D bar chart.
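One common way to find co-regulated genes, as described for the C++ application, is to correlate expression profiles across samples. Gene pairs whose correlation exceeds a cut-off are then linked in a network. The article does not say which measure the application uses. The Python sketch below assumes Pearson correlation and an arbitrary threshold of 0.9, and the gene names and expression values are hypothetical.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Tuple

# Hypothetical expression profiles: one value per microarray sample.
PROFILES: Dict[str, List[float]] = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],  # rises with geneA
    "geneC": [4.0, 3.0, 2.0, 1.0],  # falls as geneA rises
}


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / (sx * sy)


def coexpression_edges(profiles: Dict[str, List[float]],
                       threshold: float = 0.9) -> List[Tuple[str, str, float]]:
    """Return gene pairs whose profiles correlate at or above the threshold;
    these pairs are the edges of a simple co-expression network."""
    edges = []
    for g1, g2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[g1], profiles[g2])
        if r >= threshold:
            edges.append((g1, g2, r))
    return edges


if __name__ == "__main__":
    for g1, g2, r in coexpression_edges(PROFILES):
        print(f"{g1} -- {g2}  r={r:.2f}")  # geneA -- geneB  r=1.00
```

Only the geneA and geneB pair is linked here. geneC is perfectly anti-correlated with both; some analyses would also treat that as co-regulation by thresholding the absolute value of r.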
Preclinical research
Once a candidate molecule is obtained, its potential to develop into a drug depends on its efficacy in humans. However, before a candidate drug or molecule can be tested on humans, researchers have to demonstrate its effectiveness and lack of toxicity in animals. Preclinical research is therefore carried out using experimental animal models before clinical trials on humans. These studies generally produce large volumes of data over shorter periods than clinical studies, which exacerbates the data management problem. Researchers also need visibility over the entire supply chain, including animal breeding, cohorts and samples, to make timely decisions during the studies. Persistent Systems has developed a laboratory information management system for managing preclinical experimental data and facilitating its exchange among different researchers and research centers.
Clinical research
In clinical research, studies are carried out on human beings rather than on the animal models used in the preclinical phase. In this phase, the drug is tested in healthy subjects and in patients suffering from the target disease. The aim is to study side effects, dosage and other factors both before and after commercialization of the drug. To aid researchers in clinical research, Persistent Systems has developed several tools, such as:
- A web-based data management tool for tissue banks (caTISSUE): caTISSUE is a web-based informatics system that helps in the collection, processing, storage and distribution of human specimens for correlative scientific cancer research. It keeps track of multiple specimens from each participant and tracks refined materials (RNA, DNA, protein) used for molecular analysis. It also annotates biospecimens with accumulating experimental data. caTISSUE Core also helps reduce the functional complexity of running biospecimen banks.
- A data warehouse to assist researchers in comparing analyzed clinical datasets (symptoms/phenotype) with genetic information (genotype):
The warehouse contains information about SNPs from dbSNP, clinical trial data and genetic information. This comprehensive system enables researchers to analyze pharmacogenomic and clinical data in-house.
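At its simplest, comparing phenotype with genotype means joining each subject's SNP genotype to their clinical outcome and cross-tabulating the two. The Python sketch below is illustrative only, and the subject IDs, genotypes and outcome labels are hypothetical. A real pharmacogenomic analysis would also apply a statistical test (for example a chi-square or Fisher's exact test), which is not shown.

```python
from collections import Counter
from typing import Dict

# Hypothetical genotypes at one SNP, keyed by subject ID.
GENOTYPES: Dict[str, str] = {"S1": "AA", "S2": "AG", "S3": "GG", "S4": "AA"}

# Hypothetical clinical outcome per subject (S5 has no genotype on record).
OUTCOMES: Dict[str, str] = {
    "S1": "responder", "S2": "non-responder", "S3": "non-responder",
    "S4": "responder", "S5": "responder",
}


def cross_tabulate(genotypes: Dict[str, str], outcomes: Dict[str, str]) -> Counter:
    """Count subjects per (genotype, outcome) pair, keeping only subjects
    present in both datasets (an inner join on subject ID)."""
    table: Counter = Counter()
    for subject, genotype in genotypes.items():
        if subject in outcomes:
            table[(genotype, outcomes[subject])] += 1
    return table


def response_rate(table: Counter, genotype: str) -> float:
    """Fraction of subjects with the given genotype who responded."""
    responders = table[(genotype, "responder")]
    total = sum(n for (g, _), n in table.items() if g == genotype)
    return responders / total if total else 0.0


if __name__ == "__main__":
    table = cross_tabulate(GENOTYPES, OUTCOMES)
    print(response_rate(table, "AA"))  # 1.0
```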
- Clinical trial management system: This system helps clinical trial coordinators track and manage the performance of a clinical study. It predicts the final number of recruited subjects for each study, country or center based on the current recruitment rate, and lets users view this information as graphs, plots and charts. The system also allows trial coordinators to make projections.
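A projection based on the current recruitment rate can be as simple as a linear extrapolation. The Python sketch below assumes each site has recruited at a constant daily rate since it opened and will keep that rate until a planned recruitment end date. The `Site` fields, site IDs and dates are hypothetical, and the article does not describe the system's actual forecasting model.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass
class Site:
    site_id: str
    country: str
    recruited: int  # subjects recruited so far
    start: date     # date the site began recruiting


def project_site(site: Site, today: date, end: date) -> float:
    """Linearly extrapolate a site's final recruitment to the end date."""
    elapsed = (today - site.start).days
    if elapsed <= 0:
        return float(site.recruited)  # no rate can be estimated yet
    rate = site.recruited / elapsed   # subjects per day so far
    remaining = max((end - today).days, 0)
    return site.recruited + rate * remaining


def project_by_country(sites: List[Site], today: date, end: date) -> Dict[str, float]:
    """Sum site-level projections per country."""
    totals: Dict[str, float] = defaultdict(float)
    for site in sites:
        totals[site.country] += project_site(site, today, end)
    return dict(totals)


if __name__ == "__main__":
    sites = [
        Site("IN-01", "India", 30, date(2024, 1, 1)),
        Site("IN-02", "India", 12, date(2024, 1, 31)),
        Site("DE-01", "Germany", 20, date(2024, 1, 1)),
    ]
    totals = project_by_country(sites, date(2024, 3, 1), date(2024, 6, 29))
    print({c: round(v, 1) for c, v in totals.items()})  # {'India': 150.0, 'Germany': 60.0}
```

Summing per country or per study works the same way as the per-site projection, so one function covers all three levels the system reports on.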
Conclusion
The information management systems and analysis tools developed by Persistent Systems have helped medical researchers and those in the pharmaceutical industry to effectively manage and analyze large volumes of data. This can help speed up the process of drug discovery and development.
(The authors: Armaity Davierwala is a Life Sciences Consultant. Mushtaq Ahmed is a Bioinformatics Consultant at Persistent Systems Pvt. Ltd.)